CLASSIFICATION METRICS

Theoretical Understanding

Where do we use classification metrics?

  • Classification metrics are used whenever we have a classification problem and need to measure how well our model is performing.

Types of Performance Metrics (a scikit-learn sketch computing them follows this list):

  • Accuracy: It is the percentage of correct predictions.

    • Accuracy = (TP + TN) / (TP + TN + FP + FN)
  • Precision: Out of all the predictions the model labelled positive, what percentage are actually positive.

  • It is better to use when we want to be very sure of our positive predictions.

    • Precision = TP / (TP + FP)
  • Recall: Out of all the samples that are actually positive, what percentage did the model correctly predict as positive.

  • This is a good metric to use when we want to capture as many positives as possible. For example, if we are building a system to predict whether a person has cancer, we want to flag the disease even when we are not very sure.

    • Recall = TP / (TP + FN)
  • F-Beta Score: The weighted harmonic mean of precision and recall, where beta controls how much weight recall is given relative to precision.

    • F-Beta = (1 + beta^2) * (Precision * Recall) / (beta^2 * Precision + Recall)

    • F1 Score: It is the harmonic mean of precision and recall. When False Positives and False Negatives are equally important, beta is 1.

    • This metric maintains a balance between the two: if precision is low, F1 is low, and if recall is low, F1 is also low. It is well suited to problems with imbalanced class distributions.

      • F1 = 2 * (Precision * Recall) / (Precision + Recall)
    • F0.5 Score: When the impact of False Positives is high, beta is usually chosen as 0.5 (precision is weighted more heavily).

    • F2 Score: When the impact of False Negatives is high, we choose a beta value of 2 (recall is weighted more heavily).

  • AUC: It is the area under the ROC curve.

  • ROC Curve: A plot of the True Positive Rate (TPR) against the False Positive Rate (FPR) at different classification thresholds, showing how the model's performance changes as the threshold moves.

  • Confusion Matrix: It is a table that shows the number of correct and incorrect predictions.

    • Layout (rows = actual class, columns = predicted class):

                            Predicted Positive     Predicted Negative
        Actual Positive     True Positive (TP)     False Negative (FN)
        Actual Negative     False Positive (FP)    True Negative (TN)

      • True Positive (TP): The model predicted positive and the actual class is positive.

      • True Negative (TN): The model predicted negative and the actual class is negative.

      • False Positive (FP, Type I Error): The model predicted positive but the actual class is negative.

      • False Negative (FN, Type II Error): The model predicted negative but the actual class is positive.
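
  • The sketch below (not from the original notes) shows how the metrics defined above can be computed with scikit-learn on a small set of hand-made labels; the labels and the resulting numbers are purely illustrative.

      # Minimal sketch: computing the metrics above with scikit-learn.
      from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                   f1_score, fbeta_score, confusion_matrix)

      # Hypothetical ground-truth labels and model predictions (1 = positive).
      y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
      y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

      print("Accuracy :", accuracy_score(y_true, y_pred))         # (TP + TN) / total
      print("Precision:", precision_score(y_true, y_pred))        # TP / (TP + FP)
      print("Recall   :", recall_score(y_true, y_pred))           # TP / (TP + FN)
      print("F1       :", f1_score(y_true, y_pred))               # harmonic mean of precision and recall
      print("F0.5     :", fbeta_score(y_true, y_pred, beta=0.5))  # weights precision more
      print("F2       :", fbeta_score(y_true, y_pred, beta=2))    # weights recall more

      # Confusion matrix with labels=[0, 1]: [[TN, FP], [FN, TP]]
      print(confusion_matrix(y_true, y_pred, labels=[0, 1]))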

When do we use Accuracy?

  • It is used when the dataset is balanced and we want to know the overall proportion of correct predictions the model makes.

Why should we not use Accuracy on an imbalanced dataset?

  • Let us take a scenario where our dataset is imbalanced and we are using accuracy as our metric.
  • Our dataset has two target classes, A and B. After a train-test split, the training data has 1200 samples in class A and 200 samples in class B.
  • On an imbalanced dataset the model tends to be biased towards the class that appears more often, so in our case the model is biased towards class A.
  • Here accuracy looks high simply because the model predicts class A for almost everything and class A dominates the data: even a model that always predicts A would score 1200 / 1400 ≈ 86%, while class B is almost never detected (see the quick check below).
  • So we should avoid using accuracy on an imbalanced dataset.
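
  • Quick check of the numbers above (a tiny sketch, using the hypothetical 1200/200 split from these notes):

      # A model that always predicts the majority class A still looks "accurate".
      class_a, class_b = 1200, 200              # imbalanced class counts
      correct = class_a                         # every class-A sample is predicted correctly
      accuracy = correct / (class_a + class_b)
      print(f"Accuracy of always predicting A: {accuracy:.1%}")   # ~85.7%, yet class B is never found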

What metric should we use for an imbalanced dataset?

  • We should use Precision and Recall (or the F1 score, which balances the two).

When should we use Precision and Recall?

  • Let’s take a scenario where we are doing spam mail detection.
  • In this case, if our model predicts a non-spam mail as spam, that is a false positive, and the user could miss an important mail.
  • So in this case Precision is much more important than Recall.

  • Now let’s take a scenario where we classify whether a person has cancer or not.
  • In this case, if our model predicts that a person who has cancer does not have it, that is a false negative, and the patient could miss the actual diagnosis, which is a disaster. In this scenario Recall is much more important than Precision (the threshold sketch below shows how the two trade off).
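
  • A small sketch of this precision/recall trade-off (assumed synthetic data via scikit-learn, not from the original notes): lowering the decision threshold catches more positives (higher recall, the cancer case), while raising it makes the model more certain before flagging a positive (higher precision, the spam case).

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import precision_score, recall_score

      # Imbalanced synthetic data: roughly 10% positive class.
      X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
      proba = model.predict_proba(X_test)[:, 1]    # probability of the positive class

      for threshold in (0.2, 0.5, 0.8):
          y_pred = (proba >= threshold).astype(int)
          print(f"threshold={threshold}",
                "precision:", round(precision_score(y_test, y_pred, zero_division=0), 2),
                "recall:", round(recall_score(y_test, y_pred), 2))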

What is TPR and FPR?

  • TPR is the proportion of actual positives that are correctly predicted as positive; it is also called sensitivity (or recall) and is given as:
    • TPR = TP / (TP + FN)
  • FPR is the proportion of actual negatives that are incorrectly predicted as positive; it equals 1 - specificity and is given as (see the sketch below for computing both from a confusion matrix):
    • FPR = FP / (FP + TN)
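
  • A minimal sketch (hand-made labels, illustrative only) computing TPR and FPR directly from a confusion matrix:

      from sklearn.metrics import confusion_matrix

      y_true = [1, 0, 1, 1, 0, 0, 1, 0]
      y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

      # With labels=[0, 1] the matrix unravels as TN, FP, FN, TP.
      tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
      tpr = tp / (tp + fn)        # sensitivity / recall
      fpr = fp / (fp + tn)        # 1 - specificity
      print("TPR:", tpr, "FPR:", fpr)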

INTERVIEW QUESTIONS

  • 1. Tell me a scenario where a False Positive is more important than a False Negative.

    • In the case of spam mail detection: if a non-spam mail is classified as spam, the user could miss an important mail.

  • 2. Tell me a scenario where a False Negative is more important than a False Positive.

    • When we are classifying whether a patient has cancer or not. If a person has cancer, we should classify them as having cancer; if the model predicts that a person does not have cancer when they actually do, that is the worst-case scenario.

  • 3. If given the option, would you go with model accuracy or model performance?

    • Suppose we are processing medical images to detect cancer cells, where accuracy plays a vital role and there isn’t any time constraint. Here the cost of an error is very high, so model accuracy should be excellent; it is expected to be close to 100%.

    • In computer vision, real-time face detection has to be very fast and does not demand very high accuracy, so performance/speed should be considered in place of accuracy.

  • 4. How is the AUC-ROC curve used in classification problems?

    • A ROC curve (receiver operating characteristic curve) is used to measure the performance of a classification model at various classification thresholds. It essentially lets us separate the signal from the noise, using the Area Under the Curve (AUC) as the measure of a classifier’s ability to distinguish between classes. AUC values lie between 0 and 1 and serve as a summary of the ROC curve.

    • If AUC = 1, the classifier can perfectly distinguish between all the Positive and Negative class points.

    • If 0.5 < AUC < 1, there is a high chance that the classifier will be able to distinguish the positive class values from the negative class values, because it detects more True Positives and True Negatives than False Negatives and False Positives.

    • If AUC = 0.5, then the classifier is not able to distinguish between Positive and Negative class points. Meaning either the classifier is predicting a random class or a constant class for all the data points.

    • If AUC = 0 then the classifier would be predicting all Negatives as Positives, and all Positives as Negatives.

    • In summary, the higher the AUC value for a classifier, the better its ability to distinguish between positive and negative classes.
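
    • A short sketch (assumed synthetic data, not part of the original answer) of computing AUC and the ROC curve points from predicted probabilities with scikit-learn:

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score, roc_curve

        X, y = make_classification(n_samples=1000, random_state=42)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]        # scores for the positive class

        print("AUC:", roc_auc_score(y_test, proba))      # 0.5 = random guessing, 1.0 = perfect
        fpr, tpr, thresholds = roc_curve(y_test, proba)  # the points that trace the ROC curve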

  • 5. How would you choose an evaluation metric for an imbalanced classification problem?

    • The first approach is to talk to project stakeholders and figure out what is important about the particular model. Then we can select a few metrics that seem to capture what is important and test them in different scenarios. We can test what happens to each metric if a model predicts all the majority class, all the minority class, if it does well, if it does poorly, and so on. With a few small tests, we can get a feeling for how the metric might behave (a small baseline sketch follows at the end of this answer).

    • Another approach might be to perform a literature review and discover what metrics are most commonly used by other practitioners or academics working on the same general type of problem.

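
    • A small sketch of the baseline test described above (the synthetic imbalanced data and DummyClassifier are my assumed setup, not from the original notes): accuracy stays high for a model that only predicts the majority class, while F1 collapses, which is exactly the behaviour this test is meant to expose.

        from sklearn.datasets import make_classification
        from sklearn.dummy import DummyClassifier
        from sklearn.metrics import accuracy_score, f1_score

        # Roughly 95% majority class, 5% minority class.
        X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

        baselines = {
            "all majority": DummyClassifier(strategy="most_frequent"),
            "all minority": DummyClassifier(strategy="constant", constant=1),
        }
        for name, dummy in baselines.items():
            y_pred = dummy.fit(X, y).predict(X)
            print(name,
                  "accuracy:", round(accuracy_score(y, y_pred), 3),
                  "f1:", round(f1_score(y, y_pred, zero_division=0), 3))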